Heart Disease Risk Identification using Machine Learning Techniques for a Highly Imbalanced Dataset: a Comparative Study
Abstract
Heart disease has become one of the most prevalent diseases in the world today. It is estimated that 32% of all deaths worldwide are caused by heart diseases. One of the major causes for this is that it is extremely difficult, even for medical practitioners, to predict heart attacks, as it is a complex task which requires a great amount of knowledge and experience. The number of deaths caused by heart diseases has hugely increased in the recent past. Machine learning is one of the most popular areas in computer science, where many problems have been addressed successfully, especially in the field of medicine. In this study we trained multiple supervised machine learning classifiers, namely Naïve Bayes, LightGBM, Decision Trees, Random Forest, XGBoost, K Nearest Neighbours and AdaBoost, compared their accuracies, and identified which models perform better in heart disease prediction. We used the Behavioral Risk Factor Surveillance System (BRFSS) 2015 Heart Disease Health Indicators Dataset. The dataset was highly imbalanced; in order to address the class imbalance problem we used sampling methods such as Synthetic Minority Over Sampling Technique (Smote), Random Over Sampling, Adaptive Synthetic Sampling, Random Under Sampling, TomekLink, SmoteTomek, Smoteen and Cluster Centroid. According to the results obtained, we can conclude that the hybrid sampling method SmoteTomek performed better than the other sampling methods.
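The hybrid resampling idea named in the abstract can be sketched in a few lines. The following is a minimal, standard-library-only illustration, not the authors' pipeline: it oversamples the minority class by SMOTE-style interpolation and then removes Tomek links (pairs of mutual nearest neighbours with different labels). The function names, the choice to drop both members of each link, and the requirement of at least two minority points are all assumptions made for this sketch.

```python
import math
import random


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    best, best_dist = None, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)
        if d < best_dist:
            best, best_dist = j, d
    return best


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a random
    minority point and one of its k nearest minority neighbours.
    Assumes len(minority) >= 2 (hypothetical helper, not a library API)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (single pass).
    Dropping both members is an assumption of this sketch."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count,
    then clean the result by removing Tomek links."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, rng)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_res, y_res)
```

Calling `smote_tomek(X, y)` on a list of feature tuples `X` and a list of binary labels `y` returns the resampled pair. For real experiments a maintained implementation such as the `SMOTETomek` class in the imbalanced-learn package would normally be preferable to this brute-force sketch, which scales quadratically with the number of rows.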
Similar resources
Machine Learning Classification Techniques: A Comparative Study
Machine learning is the study of computer algorithms that improve automatically with experience. In other words it is the ability of the computer program to acquire or develop new knowledge or skills from examples for optimising the performance of a computer or a mobile device. In this paper we apply machine learning techniques Bayes network, Logistic Regression, Decision Stump, J48, Random For...
Machine-Learning Techniques for Customer Retention: A Comparative Study
Nowadays, customers have become more interested in the quality of service (QoS) that organizations can provide them. Services provided by different vendors are not highly distinguished which increases competition between organizations to maintain and increase their QoS. Customer Relationship Management systems are used to enable organizations to acquire new customers, establish a continuous rel...
Dataset Editing Techniques: A Comparative Study
Editing techniques remove examples from datasets with the goal to obtain more accurate and faster classifiers. The objective of this paper is to compare several popular dataset editing techniques with respect to classification accuracy and training set compression rate including Wilson editing, Citation editing, and Multi-edit. Moreover, supervised clustering editing is introduced which replace...
Bankruptcy Prediction by Supervised Machine Learning Techniques: A Comparative Study
It is very important for financial institutions to be capable of accurately predicting business failure. In the literature, a number of bankruptcy prediction models have been developed based on statistical and machine learning techniques. In particular, many machine learning techniques, such as neural networks, decision trees, etc. have shown better prediction performances than statistical ones....
Class-Boundary Alignment for Imbalanced Dataset Learning
In this paper, we propose the class-boundary-alignment algorithm to augment SVMs to deal with imbalanced training-data problems posed by many emerging applications (e.g., image retrieval, video surveillance, and gene profiling). Through a simple example, we first show that SVMs can be ineffective in determining the class boundary when the training instances of the target class are heavily outnum...
Journal
Journal title: KDU journal of multidisciplinary studies
Year: 2022
ISSN: 2579-2245, 2579-2229
DOI: https://doi.org/10.4038/kjms.v4i2.50